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Abstract The Markov chain Monte Carlo update method to construct an irreversible 
kernel has been reviewed and extended to general state spaces. The several conver- 
gence conditions of the Markov chain were discussed. The alternative methods to 
the Gibbs sampler and the Metropolis-Hastings algorithm were proposed and as- 
sessed in some models. The distribution convergence and the sampling efficiency 
are significantly improved in the Potts model, the bivariate Gaussian model, and so 
on. This approach using the irreversible kernel can be applied to any Markov chain 
Monte Carlo sampling and it is expected to improve the efficiency in general. 



1 Introduction 

The Monte Carlo method for high-dimensional problems has a wide variety of appli- 
cations as an interdisciplinary computational tool. There are many interesting topics 
on strongly correlated systems in physics, e.g., the phase transitions, where the di- 
mension of the state space for capturing the essential physics is often more than 
hundreds of thousand. The Markov chain Monte Carlo (MCMC) method that over- 
comes the curse of dimensionality has been effectively applied to many problems 
in high dimensions. Instead of the curse of dimensionality, the MCMC method suf- 
fers from the sample correlation. Since the next configuration is generated (updated) 
from the previous one, the samples are not independent of each other. Then the cor- 
relation gives rise to two problems; we have to wait for the distribution convergence 
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(equilibration) before sampling, and the number of effective samples is decreased. 
The former convergence problem is quantified by a (total variation) distance to the 
target distribution |27ll and the spectral gap of the transition kernel. As an assess- 
ment for the latter problem, the decrease of the number of effective samples, the 
integrated autocorrelation time is defined as 



where (9, is an observable at the z'-th Monte Carlo step, and C(t) is ideally inde- 
pendent of i after the distribution convergence. This autocorrelation decreases the 
number of effective samples roughly as M e ff ~ M/(l + 2T; nt ), where M is the total 
number of samples in simulations. Although an MCMC method satisfying appro- 
priate conditions guarantees correct results asymptotically in principle ifTTl . vari- 
ance reduction of relevant estimators is crucial for the method to work in prac- 
tice. If the central limit theorem holds, the variance of expectations decreases as 
v/M ~ var(/)/M e ff, where v is called the asymptotic variance that depends on the 
integrand function and the update method through the autocorrelation time. 

We should care the following three key points for efficient update in the MCMC 
method. The first is the choice of the ensemble. The extended ensemble meth- 
ods, such as the multicanonical method and the replica exchange method, have 
been proposed and applied to the protein folding problems, the spin glasses, etc. 
The second is the selection of candidate configurations. The cluster algorithms, 
e.g., the Swendsen-Wang algorithm, can overcome the critical slowing down (the 
correlation-time growth in a power-law form of the system size) by taking advantage 
of mapping to graph configurations in many physical models. The hybrid (Hamilto- 
nian) Monte Carlo method |5 j performs a simultaneous move, where the candidate 
state is chosen by the Newtonian dynamics with an artificial momentum. The third 
is the determination of the transition kernel (probability), given candidate configu- 
rations. We focus our interest on this kernel construction problem in this paper. 



2 Reversibility of Transition Kernel 

We will mention the reversibility of the transition kernel of Markov chain and some 
conventional approaches to the construction of an irreversible kernel in this section. 
In the MCMC method, the ergodicity of the Markov chain guarantees the consis- 
tency of estimators; the Monte Carlo average asymptotically converges in probabil- 
ity. The total balance, that is to say, the invariance of the target distribution is usually 
imposed although some interesting adaptive procedures have been proposed these 
days. Since the invention of the MCMC method in 1953, the reversibility, namely 
the detailed balance, has been additionally imposed in most practical simulations 
as a sufficient condition of the total balance. Every elementary transition is forced 
to balance with a corresponding inverse process under the detailed balance. Thanks 



(=1 t=\ 



Ec(/) = £ 



(Oj+tOj) - (Q) 2 
(0 2 )-(0)2 



(1) 



General Construction of Irreversible Kernel in Markov Chain Monte Carlo 



3 



to this condition, it becomes practically easy to find qualified transition probabil- 
ities in actual simulations. The standard update methods, such as the Metropolis 
(Metropolis-Hastings) algorithm |[T6l [9] and the heat bath algorithm (Gibbs sam- 
pler) (8 ), satisfy the reversibility. The performance of these seminal update methods 
has been analytically and numerically investigated in many papers. 

There is a simple theorem about the reversible kernel as a guideline for the op- 
timization of the transition matrix. Now we consider a finite state space and define 
an order of the matrices as P2 > Pi for any two transition matrices if each of the 
off-diagonal elements of P2 is grater than or equal to the corresponding off-diagonal 
elements of Pi . The following statement is Theorem 2.1.1 of Ref. ||2T1 . 

Theorem 1 (Peskun). Suppose each of the irreducible transition matrices P\ and 
P2 is reversible for a same invariant probability distribution %. If P2 > Pi, then, for 
any f, 

v(/,Pi,7T)>v(/,P 2 ,7r), (2) 

where 

v(f,P,n)= limMvar(/ M ), (3) 

M— j-oo 

andlM = Y?f=\ f( x i) /M is an estimator of I = E 7[ (f) using M samples, x\ ,X2, ...,im, 
of the Markov chain generated by P. 

According to this theorem, a modified Gibbs sampler called the "Metropolized 
Gibbs sampler" was proposed 171 1131 . By the usual Gibbs sampler, we choose the 
next state with forgetting the current state. By the Metropolized version, on the other 
hand, a candidate is chosen except the current state and it will be accepted/rejected 
by using the Metropolis scheme. The resulting transition probability from state / to 
j is expressed as pf_^j = rnin[7t// (1 — Tti), 7tj/ (1 — nf)) V7 7^ j, where %i is the target 
measure of state i. This modified Gibbs sampler is reduced not to the usual Gibbs 
sampler but to the Metropolis algorithm in the case where the number of candidates 
is two. Here the Peskun's theorem says the following theorem ll7l [T3l . 

Theorem 2 (Frigessi-Hwang-Younes-Liu). Let P° and P MG be as the transition 
matrix determined by the Gibbs sampler and the Metropolized Gibbs sampler for a 
same invariant distribution %, respectively. Then, for any non-constant function f, 

V (f,P G ,7C)>v(f,P MG ,n). (4) 

Moreover, also an iterative version of the Metropolized Gibbs sampler was pro- 
posed [7|. What we have learned from the Peskun's theorem is the guideline that 
the rejection rate (the diagonal elements of the transition matrix) should be mini- 
mized in general. As we see, some optimizations of the transition matrix have been 
proposed within the detailed balance. 

However, the reversibility is not a necessary condition for the invariance of the 
target distribution. The sequential update where the state variables are swept in a 
fixed order breaks the detailed balance but can satisfy the total balance 1151 . In the 
meanwhile, some modifications of reversible chain into an irreversible chain have 
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been proposed so far. One example is a method of the duplication of the state space 
with an additional variable, such as a direction on an axis [4, 29]. The axis can be a 
combination of the state variables, the energy (cost function), or any quantity. The 
extended version with the multi-axes (fibers) have been applied to some physical 
models (6). Also the artificial momentum in the hybrid Monte Carlo method (5) 
performs partly as a direction in the state space. A similar idea with the addition 
of a direction has been proposed 1201 . where the next state in the Markov chain is 
generated depending on not only one step before but also the two (several) steps 
before. Then the resulting Markov chain can be irreversible because the history of 
the states has the direction. As other approaches, inserting a probability vortex in 
the state space was discussed (23), an asymmetric choice of the heading direction 
was applied in a hard-sphere system (3), and a global optimization of the transi- 
tion matrix was discussed ifTOl . As seen above, the role of the net stochastic flow 
(irreversible drift) has caught the attention fTD . Note that the hybrid Monte Carlo 
method seems to break the detailed balance in the extended state space, but it is not 
essential because the additional update of the artificial momentum easily recovers 
the reversibility. A more significant breaking of the reversibility in the method can 
be achieved by applying the methods we will explain in this paper. 

Most of the irreversible chains were based on the reversible update methods, such 
as the Metropolis algorithm. Recently, however, a new type of method breaking 
the detailed balance was invented by Suwa et al. (25), which applies a geometric 
approach to solve the algebraic equations. We will briefly review this algorithm for 
discrete variables in the next section and extend it to continuous spaces generally 
later. 



3 Geometric Allocation 

We will introduce the geometric allocation approach for the optimization problem 
of the transition matrix in this section. In the MCMC method, the configuration is 
locally updated, and the huge transition kernel or matrix is implicitly constructed 
by the consecutive local updates. Following Ref. 0251 . let us consider a local update 
of a discrete variable as an elementary process. Now we have n next candidate con- 
figurations including the current one. The weight of each configuration is given by 
wi (i = l,...,n), to which the target probability measure 7tj is in proportion. We in- 
troduce a quantity vij :~ WiPi^j that corresponds to the amount of (raw) stochastic 
flow from state ; to j. The law of probability conservation and the total balance are 
expressed as 

// 

Wi = x 'ij ( 5 ) 
7=1 

11 

w i = L V iJ ( 6 ) 

1=1 
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respectively. The average rejection rate is written as Y,i v ii/lLi w i- ^ i s easily con- 
firmed that the Metropolis algorithm with the flat proposal distribution gives 

Vij = — min [\Vi ,wj] i^j, (7) 



and the heat bath algorithm (Gibbs sampler) does 

WiWj 



Vij. (8) 



The both satisfy the above conditions ([5]) and ||6). The reversibility is manifested by 
the symmetry under the interchange of the indices: vy = Vy; 

The aim here is to find a set {vy} that minimizes the average rejection rate (the 
diagonal elements of the transition matrix) under the conditions (0) and (O accord- 
ing to the sense of Theorem[T] The procedure for the task can be understood visually 
as weight allocation where we move (or allocate) weight (vy) from z'-th state to j- 
the box with the entire shape of the weight boxes kept intact (Fig.[T|i. The proposed 
algorithm in Ref. Il25l is described as Alg.Q] By this procedure (the index 1 is such 
that w\ has the maximum weight), {v,^ ; } are determined as 

v,_y = max(0, min(4, 7 , w ; +wj - Ay, w u wj)), (9) 

where 

4,y:=S,-Sy_i+wi 1 <ij<n (10) 

Si:=Y,w k \<i<n (11) 

k=\ 

So := S n . (12) 

It satisfies the conditions (f5]) and (O, but breaks the reversibility; for example, V12 > 
but \'2i —0 as seen in Fig.Q] As a result, the self-allocated weight that corresponds 
the rejection rate is expressed as 

(max(0,wi-Z? =2 Wi) i=l 
That is, a rejection-free solution is obtained if the condition 

m < ^ = \ j^wk (14) 

L L k=\ 

is satisfied. When it is not satisfied, the maximum weight has to be assigned to 
itself since it is larger than the sum of the rest. Thus, this solution is optimal in the 
sense that it minimizes the average rejection rate. Furthermore, the rejection rate 
expression ( fT3l l provides us a clear prospect; the rejection rate is certainly reduced 
as the number of candidates is increased. This idea was used to a quantum physical 
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Metropolis heat bath present 

Fig. 1 Comparison of the Suwa-Todo algorithm 1251 (present) with the conventional algorithms 
from the view of the geometric allocation. The proposed method accomplishes the rejection-free 
update in this case. This figure was taken from the reference. 



Algorithm 1 Suwa-Todo Algorithm for Irreversible Kernel Construction 

(i) Choose a configuration with maximum weight among the candidates. If two or more config- 
urations have the same maximum weight, choose one of them. In the following, we assume w\ 
is the maximum without loss of generality. The order of the remaining weights does not matter. 

(ii) Allocate the maximum weight W\ to the next box = 2). If the weight still remains after 
saturating the box, reallocate the remainder to the next (i = 3). Continue until the weight is all 
allocated. 

(iii) Allocate the weight of the first landfilled box (wt) to the last partially filled box in step (ii). 
Continue the allocation likewise. 

(iv) Repeat step (iii) for W3, W4, —,M> n . Once all the boxes with i > 2 are saturated, landfill the 
first box (/ = 1 ) afterward. 



model and the rejection rate was indeed reduced to zero 12511 , The application to 
continuous variables will be discussed in Sec. [7] 



4 Ergodicity 



We will discuss the ergodicity of the Markov chain |27l [TTl constructed by the 
update introduced in the previous section. Let G be a finite graph, V be the set of 
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vertices of G, and So be a finite set (colors). Let x be a | V |-tuple state, x = (x v ) ve y, 
x-o G So, and S be the finite state space as S = {x}. We assume that an unnormalized 
density function (weight) w on S, which is in proportion to the target measure, can 
be computed for each x. Let us set an initial state x^ G S and update the state 
repeatedly by the following procedure. 

1. Choose a vertex V G V by a density distribution function q s.t. q(v) > Vl) G V. 

2. Update the state x v by using the transition probabilities determined from Alg.Q] 
(or Eq.|9]l, given the all other local states x~v = ( x s)v^sev- 

Here the order of the configurations in the allocation process is arbitrary as men- 
tioned in Alg. Q] but it is assumed to be fixed in advance. The probabilities, then, 
need to be calculated only once and the CPU time can be reduced in practical sim- 
ulations. Let us call this procedure as the Suwa-Todo in random-order update algo- 
rithm. Then the following theorem holds. 

Theorem 3. The Markov chain constructed by the Suwa-Todo in random-order up- 
date algorithm for a finite state space is irreducible. 

Proof. Given the other local states x- v , Let us define n = \So\, a n x n local transi- 
tion matrix P = P(x- V ) determined by Alg. [U and the matrix element P„^b corre- 
sponding to the transition probability from state a G So to b G So- First we show the 
following lemma. 

Lemma 1. The transition matrix made by Alg. [7J« irreducible. 

Proof (Lemma\T). Let us consider a consecutive update by P. Then the state vari- 
able x v will be cyclically updated. Let c 6 So be the candidate state with maximum 
weight. The cyclic order by the multiplication of P certainly returns back to c be- 
cause it has the maximum weight; Va £ So,N 9 3mi < n s.t. P'aXc > 0. In other 
words, we see that the cyclic loop starts from c and ends at c. Thus any state can be 
visited from c; Vb € So,N 9 3m2 < n s.t. P f^, > 0. Setting m = m\ +1112 < In, we 
show the irreducibility: Ma,b e So, 3m G N s.t. P"^ h > 0. □ 

Let P be the whole transition matrix corresponding one Monte Carlo step. Let us 
consider a update process from state y £ S to z G S. If we choose a vertex Ui by 
probability q(vi) at appropriate M\ times in a row, we can set y Vl = z Vl according 
to Lemma [T] Similarly, if we choose a vertex u, at M, times in a row, we can set 
yvi = Zvj for i = 1,2, |V|. Thus, setting M = £;M;, we show the irreducibility: 
Vy,z € S,3M G N s.t. P^ z > 0. □ 

In a finite state space, an irreducible and aperiodic Markov chain is ergodic ifTTll . 
If a rejection process exists, the chain is aperiodic, so it becomes ergodic. On the 
other hand, when there is no rejection over all situations, the aperiodicity is tricky. 
For example, when n = 3 and w\ = W2 + W3, the period of the local transition kernel 
(matrix) becomes 2. If such a special relation and a same non-zero period exist 
for all local transition matrices, the chain is periodic. It is not trivial to clarify the 
necessary condition for the aperiodicity. Here we show a sufficient condition that is 
satisfied in most cases. 
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By a consecutive update for a same local state variable, it will be cyclically up- 
dated on the loop as oq — >• ci\ —> ■ ■ ■ — >• at (a, £ SqJ = 0, 1, ...,£) s.t. «o = ag for 
£ = 1, or «o = a ( a j(j = 1,2, ...,£— 1) for £ > 2. Let us call the number of steps 
£ as the period of the cyclic loop. Then let us say that a local transition matrix P is 
ergodic if 3Nq € N,Va,& € SnVfc > No s -t- > 0- ^ n most cases ' a local transition 
matrix that has more than two periods of cyclic loop exists. Here, let us remem- 
ber the simple property of the coprime numbers: If natural numbers p and q are 
coprime, Mk > (p — l)(q — l),3a,b 6 N°s.t. k = ax p + bx q. Thus, if a pair of 
two coprime periods (p and q) exists, the local transition matrix is ergodic because 
No = (p — l)(q — 1) + 2(p m — 1 ), where p m is the maximum of the period, satisfies 
the condition. Then the following theorem holds. 

Theorem 4. If there is an ergodic local transition matrix, the Markov chain by the 
Suwa-Todo in random-order update algorithm is ergodic. 

Proof. Because of the irreducibility, there is a positive transition probability from 
any state y e S to / £ S where the ergodic local transition matrix is used. Let M\ be 
an integer s.t. Pr\^i > 0. In a similar way, for any state z <E S, let M? be an integer 

s.t. Py^ z > 0. Let No be an integer that is defined in the local transition-matrix 
ergodicity. Then, setting an integer Mq = M\ +Nq +M2, we see 3Mo <G N,Vy,z <S 
S Vk> Mq s.t. Py^ z > 0. The ergodicity of the Markov chain is shown from this 
condition and the Perron-Frobenius theorem. □ 

We have considered choosing a vertex to update randomly. As a simple way, 
we can choose one uniformly randomly. In many actual simulations, however, the 
sequential choice of the vertices by a fixed order is used. It is because the correlation 
time becomes empirically shorter. In fact, the rejection rate in the sequential |V|- 
vertex update (one sweep) gets smaller than that in the uniformly random \ V | -vertex 
update l22l . However, it is far from trivial to prove the ergodicity for the sequential 
update. Even if the simple Metropolis algorithm is used for the local update, the 
ergodicity can be violated in some cases, such as in the Ising (binary) model on 
the 2x2 square lattice (graph) lfT31 . Although we can practically check the validity 
by comparing it with the random-order update, the clarification of the ergodicity 
condition in the sequential update is an interesting future problem. 



5 Benchmark in Potts Model 

In order to assess the effectiveness of the Suwa-Todo (ST) algorithm l25l . we in- 
vestigate the convergence and the autocorrelations in the ferromagnetic g-state Potts 
model on the square lattice ll30l : the local state at vertex (site) k is expressed as Cfy 
that takes an integer (1 < Cfy < q) and the cost function (energy) is expressed as 
H = — 8oj(jj> where is a pair of connected vertices i and j on the graph 
(lattice). This system exhibits a continuous (q < 4) or first-order (q > 4) phase transi- 
tion at T = l/ln(l + Jq). We calculate the square of order parameter BH (structure 
factor) for q = 4, 8 by the several algorithms. 
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Fig. 2 Convergence of the order parameter (square root of the structure factor) in the ferromagnetic 
4-state Potts model on the square lattice with L = 32 at the critical temperature. The horizontal 
axis is the Monte Carlo step. The simulation starts with the ordered (all "up") state. The rejection- 
minimized (ST) sampler achieves the fastest convergence. The error bars are the same order with 
the point sizes. 



The order parameter convergence (equilibration) is shown in Fig. [2] where the 
simulation starts with a fully-ordered (all "up") state and the local variables are 
sequentially updated by the several algorithms. The square lattice with L = 32 on 
the periodic boundary condition and the critical temperature T = 0.9102392266 
are used. The Metropolis algorithm, the heat bath algorithm (Gibbs sampler), the 
Metropolized Gibbs sampler [13], the iterative Metropolized Gibbs sampler 0, the 
optimal average sampler IflOl , and the ST algorithm (Alg. [TJ |24l are compared. 
The validity of the all update methods are confirmed by comparing the asymptotic 
estimator convergence with each other (the Markov chain by the Gibbs sampler 
is ergodic). The ST algorithm accomplishes the fastest convergence of the quan- 
tity (square root of the structure factor). This acceleration implies that the locally 
rejection-minimized algorithm reduces the second largest eigenvalue of the whole 
transition matrix in absolute value and increases the spectral gap of the Markov 
chain. 

In Fig. [3] it is clearly seen that the ST sampler significantly reduces also the 
autocorrelation time for q = 4, 8 in comparison with the conventional methods. The 
autocorrelation time Tj nt is estimated through the relation: a 2 ~ (1 + 2Ti nt )(7Q , where 
Oq and a 2 are the variances of the estimator without considering autocorrelation 
and with calculating correlation from the binned data using a bin size much larger 
than the T; n t lfT2l . respectively. In the 4 (8)-state Potts model, the autocorrelation 
time becomes nearly 6.4 (14) times as short as that by the Metropolis algorithm, 
2.7 (2.6) times as short as the heat bath algorithm, and even 1.4 (1.8) times as short 
as the iterative Metropolized Gibbs sampler. We investigated also the dynamical 
exponent of the autocorrelation time at the critical temperature. Unfortunately, the 
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Fig. 3 Autocorrelation time of the square of order parameter near the transition temperature 
(T ~ 0.910 and 0.745, respectively) in the 4-state (left) and 8-state (right) Potts models by sev- 
eral methods. The system size is 16 x 16. In the both cases, the ST algorithm realizes the shortest 
autocorrelation time. 



locally optimized method does not reduce the exponent. The factor over 6, however, 
is always gained against the Metropolis algorithm for all system sizes. 

As we have seen, the ST sampler based on the geometric allocation approach l24l 
boosts the convergence and reduces the autocorrelation time of the relevant estima- 
tor (order parameter). This update method will be effective in not only the Potts 
model but also the many kinds of systems. 



6 Alternative to Gibbs Sampler 

We will mention methods constructing an improved kernel for general state space 
that is not finite. Let us consider updating a continuous variable here. When the 
inversion method is applicable, the variable is updated by the heat bath algorithm 
(Gibbs sampler) usually; a uniformly random variable r <S [0,1] is generated and 
the next state is determined from the inverse function of a conditional (cumulative) 
distribution. When the inversion method cannot be applied, on the other hand, a 
candidate state is chosen from a proposal distribution and accepted/rejected by the 
Metropolis algorithm usually. In this situation, the inevitable rejection will be a 
bottleneck for sampling. For continuous variables, it is not possible to apply directly 
the allocation algorithm we mentioned because the measure of each state is zero. 
We can, nevertheless, improve the efficiency for both cases by extending the idea of 
breaking the detailed balance. First, we introduce an improved sampling that is an 
alternative method to the Gibbs sampler in this section. Then we will mention more 
general cases in the next section. 
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Fig. 4 Picture of the (cumu- 
lative) distribution shifts. The 
algorithm for the irreversible 
kernel is corresponding to 
the shift in the maximum 
weight (w'i ) as shown in the 
midst. If we shift in a larger 
value, we could get another 
rejection-free kernel as in the 
right bar. 



W4 
W 3 



I 



shift shift 



Let us review the allocation algorithm we introduced in Sec. [3] We start at the 
configuration with the maximum weight and allocate the weight to the next, as 
Alg. Q] This procedure can be also represented by shifting each position in the max- 
imum weight on the (cumulative) distribution. We compare the shifted distribution 
to the original (non-shifted) one as Fig.|4]and assign the next position (state) in the 
end. It is possible to set the amount of shift any value. If there is a self-allocation as 
a result of the shift, this amount corresponds to the rejection rate. It is obvious that 
the amount of shift that can avoid the self-allocation is not unique in the figure; the 
rejection-free kernel can be constructed as long as the amount of shift is such that the 
maximum weight has no overlap with its original position. For continuous variables, 
we can set the start point of allocation (the amount of shift) at our disposal. 

In order to explain the shift in a continuous case, let us consider the bivariate 
Gaussian distribution as a simple example: 

P(xi,x 2 )°<e ~^ . (15) 

Given x 2 , the local variable x\ is updated by using the conditional (cumulative) 
distribution ^ 

F(xi\x 2 ) =j ' P(x,x 2 )dx. (16) 

The Gibbs sampler determines the next state as 

A=F-\r), (17) 

where r € [0, 1] is an uniformly (pseudo) random variable. This process satisfies the 
detailed balance. 

The overrelaxation method [ 1 1 has been known as one of the best ways to update 
the Gaussian variables. The name "overrelaxation" comes from an idea to make 
the Markov chain to have negative correlation. In this method, for generation of 
a variable from a conditional Gaussian distribution P(zt \ •) ~ iV(/x,-,of ), the next 
state is chosen as z\ = \l\ + Oi(zi — JU;) + <T,Vl — « 2 V, where v is a random variable 
generated from N(0, 1) and a is a parameter (— 1 < a < 1). The extended approach 
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Fig. 5 Trajectories of configurations updated by the Gibbs sampler (left) and by the present ir- 
reversible algorithm with c = 0.4 and w = 0.1 (right) in the bivariate Gaussian distribution with 
o~i = 1 and o"2 = 10. The ellipsoidal line is the three-sigma line of the Gaussian distribution. The 
upper figures show the update procedures of each algorithm. This figure was taken from 1261 . 



called ordered overrelaxation was also proposed |fl9l ; after some candidates are 
generated and ordered, the next state is chosen on approximately opposite side from 
the current position. 

Now, as an another update method (26), let us choose the next state as 

x\ =F-\{F(x l \x 2 )+c + wu}) 1 (18) 

where x\ is the current state, c and w is a positive real parameter with c > w, and u 
is an uniformly random variable in [—1, 1], respectively. The symbol {a} takes the 
fractional portion of a real number a. If we use c = w = 1/2, this process is nothing 
but the Gibbs sampler. On the other hand, when c ^ 0, 1 /2, it does not satisfy the 
detailed balance and there is a net stochastic flow. This flow can push the configu- 
ration globally as in Fig. [5] As the result, the autocorrelation time of (x\ + x 2 ) 2 is 
significantly reduced as shown in Fig. [6] In this figure, the Gibbs sampler, the over- 
relaxation methods with a = —0.86, the ordered overrelaxation method (with 10 
candidates), and the present update method with c = 0.4, w = 0.05 are tested. The 
irreversible kernel produces the smallest correlation over Oij <3\ > 50 and achieves 
about 50 times as short correlation time as the Gibbs sampler. We can surely find a 
better parameter sets of the present algorithm than the best parameter of the conven- 
tional overrelaxation methods in the almost whole region. About the convergence 
condition, it is easy to show the Markov chain of the present algorithm with the 
random-order update is ergodic in a similar way to Theorem[4] 
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Fig. 6 Autocorrelation times of (x\ +JC2) 2 in the bivariate Gaussian distribution by using the Gibbs 
sampler (triangles), the overrelaxation (circles) with a = —0.86, the ordered overrelaxation (dia- 
monds) with the number of candidates 10, and the present method with c = 0.4 and w = 0.05 
(squares). The horizontal axis O2/O1 is corresponding to the sampling difficulty. The statistical 
errors are the same order with the point sizes. 



7 Beyond Metropolis Algorithm 

We will explain, in this section, that it is possible to significantly reduce the rejection 
rate for general cases. When the direct inversion method as in the previous section 
cannot be applied, we resort to the Metropolis algorithm usually, where a candidate 
is generated and accepted/rejected according to the weight and proposal probability 
ratio. It has been a canonical algorithm for the MCMC method since the invention in 
1953 |T6l . However, the inevitable rejection often obstructs the efficient sampling. 
When the number of candidates is two, the Metropolis algorithm achieves the mini- 
mized rejection rate that is easily proved by the geometric picture we introduced in 
Sec. [3] In order to reduce the rejection rate, we must prepare more candidates than 
two. As an alternative to the simple Metropolis algorithm, some methods have been 
proposed so far. An example is the multipoint Metropolis methods, where after gen- 



Fig. 7 Multi-proposal ex- 
ample for n = 4. At first, a 
hub (pivot) is chosen from 
the current position x. Then, 
candidates x', x", x'" are gen- 
erated from the hub. The dot 
line shows the 1-sigma line of 
the Gaussian distribution as a 
proposal example. 
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Fig. 8 Rejection rates (upper) and the correlation times of [x\ + X2) 2 (lower) by the simple 
Metropolis algorithm and the rejection-minimized method for n — 3,4,5 in the wine-bottle po- 
tential U9t with h = 16. The rejection rate is definitely reduced as the number of candidates is 
increased. Accompanying the rejection rate, the correlation time gets shorter. 



erating some candidate states the next configuration is stochastically chosen with 
the detailed balance kept. See references |[T4ll28l . 

We can apply the rejection-minimized (ST) algorithm after creating some candi- 
date states. Let us consider sampling from the wine-bottle (Mexican-hat) potential: 

^-<(^+^)(^ Sa £f e -»K)-<'» 

Then we propose a candidate configuration by the isotropic bivariate Gaussian dis- 
tribution q{Ax\ ,Ax2) x exp(— {Ax\) 2 — (Ax2) 2 )- Here, we try to make some propos- 
als. If we propose candidates from the current position and naively make a transition 
matrix (probability) taking into account only the weight p, the total balance is bro- 
ken. It is because we have to consider also the proposal probability q. Avoiding 
this computational cost, we use the following multi-proposal strategy for n candi- 
dates Jl8l : 

1 . A configuration is chosen as a hub (pivot) from the current configuration by a 
proposal distribution. 

2. (n — 1) candidates are generated by using the same proposal distribution with 
process 1 from the hub. 
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3. The next state is chosen among the « candidates (including the current state) 
by using the transition probabilities taking into account only the weights of the 
states. 

This procedure example is depicted as in Fig. [7] In the process 3, we can make 
the rejection rate minimized by applying the ST algorithm. Figure [8] shows that 
the rejection rate is indeed reduced by using this multi-proposal algorithm and the 
irreversible kernel. The correlation time of {x\ +xz) 2 also gets shorter as the number 
of candidates is increased. 



8 Conclusion 

We reviewed the update method to construct the irreversible kernel for a finite state 
space and showed the several conditions for the irreducibility and the ergodicity. 
The extended algorithm that is an alternative to the Gibbs sampler or the Metropo- 
lis algorithm was shown and assessed, compared with the conventional methods. 
Although the examples we used here are simple cases, the approach using the ir- 
reversible kernel is applicable to any MCMC sampling. The performance of the 
irreversible kernel and the efficient flow structure in a state space need to be inves- 
tigated further analytically and numerically in the future. 
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